Abstract — Loss tomography has received considerable attention
in recent years and a number of estimators have been
proposed. Although most of the estimators claim to be
maximum likelihood estimators, the claim is only partially true
since the maximum likelihood estimate can be obtained only
for one class of data sets. Unfortunately, few people are aware of
this restriction, which leads to the misconception that an estimator
is applicable to all data sets as long as it returns a unique solution.
To correct this, we point out the risk of this misconception
and illustrate the inconsistency between data and
model in the most influential estimators. To ensure that the model
used in estimation is consistent with the data collected from an
experiment, the data sets used in estimation are divided into 4
classes according to the characteristics of observations. Based
on the classification, the validity of an estimator is defined and
the validity of the most influential estimators is evaluated. In
addition, a number of estimators are proposed, one for each class
of data sets that has been overlooked. Further, a general estimator
is proposed that is applicable to all data classes. The discussion
starts from the tree topology and ends at the general topology.

Index Terms — Applicability, Data-driven modeling, Intersection,
Partition, Loss tomography.

I. Introduction 

Network measurement has received considerable attention
in recent years since it can not only provide necessary information
for network modeling but also verify some of the assumptions
made on an ad hoc basis for network modeling. In contrast
to direct measurement, which is only suitable for a small network,
network tomography is proposed for a large network [1].
Network tomography, as a methodology, differs from direct
measurement in many ways, and the most important one is the
use of statistical inference to accomplish a task that involves
probing, modeling, and estimation. A large number of works
have been carried out in recent years,
and the results reported so far cover loss tomography [2], [3],
[4], [5], [6], delay tomography [7], [8], [9], [10], [11], loss
pattern tomography [12], source-destination traffic matrix [13],
and shared congestion flows [14]. Despite the overwhelming
enthusiasm and a wealth of publications in this area, network
tomography is in its infancy and more work needs to be done.
In addition, some of the characteristics that
seem to be fully investigated still require further study. For
instance, although loss tomography has been studied for more
than 10 years and the likelihood equation of the tree topology
has been available for over 10 years [2], few are aware that the
likelihood equation is restricted to a specific class of data. That
is also true of almost all other estimators proposed in the past.
Using those estimators without checking the data type may result
in unexpected consequences. This paper is devoted to addressing
the validity of an estimator in the context of data classes and
to proposing estimators for the data classes that have not been
identified.

The fundamental principle of loss tomography is built on
statistical inference, where parametric estimation is frequently
used to fit a statistical model to data (observation), and then
maximum likelihood estimation is used to find the unknown
parameters of the model. Unfortunately, the principle has been
partially overlooked by almost all of the previous works. In
addition, the few exceptions that consider this issue, such as
[2] and [6], take a different approach. Rather
than fitting a model to data, they go in the opposite
direction and try to fit data to a predefined model. If a data
set does not fit the model, the data set is either discarded or
skipped. Also, the discussions presented in [2], [6] are far
from complete: they fail to identify all unsuitable data sets
and, consequently, fail to propose alternative likelihood equations
for the unidentified data sets. Furthermore, there has been
no discussion of the validity of an estimator (a likelihood
equation) in regard to data sets. Without knowing this, an
estimator may be mistakenly used on a data set that it
does not fit and return an incorrect estimate of the unknown
parameter(s). The incorrect estimate can be so different
from the correct one that it cannot be
used in traffic control or modeling. Apart from presenting the
problems of previous works, we also provide solutions:
a number of estimators, one for each data class, are proposed in
this paper to overcome the problems. Moreover, a general estimator
applicable to all types of data is proposed, and the discussion
covers both tree and general topologies.

Previous works on loss tomography focused on
proposing estimators and proving that there is a unique solution
from the proposed estimator. If an estimator cannot return
a unique solution or cannot return a solution at all, the data set
used in the estimation is considered inconsistent with the
estimator and skipped or discarded [2], [6]. Although it is
fundamentally important in terms of identifiability to ensure
a unique solution from a likelihood equation, we must
keep in mind that the likelihood equation is derived from a
likelihood function and the likelihood function is constructed
from a data set. Only if the
likelihood equation used in estimation fits the nature of the
data set is the unique solution obtained from the equation
the maximum likelihood estimate (MLE), provided the maximum
likelihood principle is used in the estimation. To remedy this,
data obtained from experiments are divided into 4 exclusive
classes according to the characteristics of intersection and
partition in observation. The 4 classes are called perfect,
chained-only, partition-only, and chained partition, respectively.
Each class requires its own likelihood equation, and the likelihood
equations proposed in [2], [6] only suit the data sets of
the perfect class. Although the perfect class is the most likely
scenario among the 4 classes, in particular when the number
of probes sent to receivers approaches infinity, i.e. $n \to \infty$,
other scenarios do occur from time to time if $n < \infty$.



A. Contribution and Paper Organization 

The data consistency issue raised in [2] aims to eliminate 3 types
of data from estimation since the likelihood equation proposed
in that paper cannot find a solution for those 3 types of data.
Nevertheless, [2] falls short of considering whether a unique
solution returned by the likelihood equation is the MLE. In
other words, a unique solution to a likelihood equation is
only a necessary condition for the MLE, while sufficiency
also requires that the likelihood equation fit the data set. To
address this, we present the relationship
between data and likelihood equations, and emphasize that
the MLE is obtained if and only if the likelihood equation
fits the data and there is a unique solution to the likelihood
equation. The contribution of this paper can be divided into
two parts, identifying problems and finding solutions, which are
detailed as follows:

1) To improve our understanding of statistical inference in 
loss tomography, this paper reiterates the importance 
of the statistical principle of fitting a model to data 
in the context of loss tomography. It points out that 
an estimator is only valid if the model used by the 
estimator fits the data collected from observation. It 
further examines the validity of the most influential 
estimators proposed for loss tomography and identifies 
the pitfalls of the estimators. 

2) To solve the problems, data used in estimation are
divided into 4 classes on the basis of intersection and
partition in the observation of descendants. The estimators
proposed so far only fit one of the 4 classes.
A number of estimators are then proposed for the other
3 classes that have been overlooked, including a general
estimator that is able to handle all data classes.

The rest of the paper is organized as follows. In Section II
we present the essential background, including the notations
and statistics used in this paper. In Section III, we present in
detail the problems that exist in the most influential estimators.
Section IV provides the solutions to the problems presented in
Section III for the tree topology. Section V extends the solutions
obtained from the tree topology to the general topology. The
last section is devoted to concluding remarks.

II. Notations and Related Works 

The two most influential works in loss tomography, one for
the tree topology [2] and the other for the general topology
[6], are introduced in this section, where the latter is developed
on top of the former. Because of this relation, both impose the
same restriction on the data used in estimation.

A. Notation 

To assist the following discussion, the symbols used in
this paper and their definitions are introduced briefly in this
section. Readers who want to know the details may refer
to 0. 

To collect information from a large network, a number of
probes are multicast from a number of sources located on
one side of the network to a number of receivers located
on the other side of the network. The paths from sources
to receivers cover the links of interest. If there is only a
single source, the paths from the source to the receivers form a
special tree, called a multicast tree, where the root has only
one child. Let $T = (V, E, \theta)$ denote the multicast tree, where
$V = \{v_0, v_1, ..., v_m\}$ is a set of nodes representing routers and
switches of a network; $E = \{e_1, ..., e_m\}$ is a set of directed
links connecting the nodes of $V$; and $\theta = \{\theta_1, ..., \theta_m\}$ is an
$m$-element vector, one element per link, describing the loss rate of
the link. $R$ is used to denote all receivers. As a hierarchical
structure, each node in a tree except the root has a parent.
Each node except the leaves has a number of descendants.
Let $d_i$ denote the descendants of node $i$ and $|d_i|$ denote the
number of descendants in $d_i$. Further, each multicast subtree
is named by the number assigned to the child node of the
root, where $T(i) = \{V(i), E(i), \theta(i)\}, i \in \{1, ..., m\}$ denotes
the multicast subtree rooted at node $f(i)$, where $V(i)$, $E(i)$
and $\theta(i)$ are the nodes, links and parameters of the subtree.
Note that a multicast subtree is different from an ordinary
one: multicast subtree $i$ is rooted at node $f(i)$, which uses
link $i$ to connect subtree $i$. The group of receivers attached to
$T(i)$ is denoted by $R(i)$. If $n$ probes are dispatched from the
source, each probe $i = 1, ..., n$ gives rise to an independent
realization $X^{(i)}$ of the loss process $X$, with $X_k^{(i)} = 1, k \in E$ if
probe $i$ passes link $k$; otherwise $X_k^{(i)} = 0$. The observations
$(\Omega_k), k \in V \setminus 0$, with $\Omega_k = (X_j^{(i)})^{i \in \{1,..,n\}}, j \in R(k)$,
comprise the data set for inference. Therefore, observations
are also called data or data sets in the following discussion.

To estimate the loss rates of a tree topology, a set of sufficient
statistics is introduced in [5], one for each node, to denote the
number of probes reaching node $i$ as confirmed from the observation
of $R(i)$: $n_i(1), i \in V$. In addition, let $n_{ij}(1), i, j \in d_k$ denote
the number of probes that are observed simultaneously by
at least one receiver attached to subtree $i$ and at least one
receiver attached to subtree $j$. Similarly, $n_{ijl}(1), i, j, l \in d_k$
denotes the number of probes observed simultaneously by
the receivers attached to subtrees $i$, $j$, and $l$. Furthermore, we
can have statistics to count the number of probes observed
simultaneously by more descendants. This process continues
until all descendants are included, i.e., $n_{d_k}(1)$.

Given $d_k$, we have a $\sigma$-algebra $M$ on $d_k$, and a measure
$I$ on $M$. Then, a measurable space, $(d_k, M, I)$, is
established for each node to obtain the statistics used in
estimation and to divide observations into classes, where

$I(x) = \sum_{i \in \{1,..,n\}} \bigwedge_{j \in x} X_j^{(i)}, \; x \in M$

counts the number of probes observed simultaneously by the
members of $x$. Note that $n_k(1) = I(x), \#(x) = 1 \wedge x \in M \setminus \emptyset$.
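
The counting measure $I(x)$ can be illustrated with a short Python sketch. The input format is an assumption made for illustration only: `obs` maps each descendant of node $k$ to a 0/1 vector with one entry per probe, already aggregated so that 1 means at least one receiver below that descendant observed the probe. The function name `intersection_counts` is hypothetical.

```python
from itertools import combinations


def intersection_counts(obs):
    """Return I(x) for every non-empty subset x of the descendants.

    obs is a hypothetical input format: it maps a descendant label to a
    list of 0/1 flags, one per probe, where 1 means that at least one
    receiver attached to that descendant's subtree observed the probe.
    """
    labels = sorted(obs)
    n = len(obs[labels[0]])  # number of probes
    counts = {}
    for size in range(1, len(labels) + 1):
        for x in combinations(labels, size):
            # A probe counts for x only if every member of x observed it.
            counts[frozenset(x)] = sum(
                all(obs[j][i] for j in x) for i in range(n)
            )
    return counts


obs = {"a": [1, 1, 0, 0], "b": [1, 0, 1, 0], "c": [0, 0, 0, 1]}
counts = intersection_counts(obs)
print(counts[frozenset({"a", "b"})])  # probes seen by both a and b
print(counts[frozenset({"a", "c"})])  # probes seen by both a and c
```

Singleton subsets give the per-descendant statistics, and larger subsets give the joint statistics such as $n_{ab}(1)$ used in the discussion of mutual exclusion.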

In contrast to the tree topology, there are multiple intersected
trees in a general network. The nodes located in
a shared area, called a shared segment, can observe probes sent
by multiple sources. To accommodate multiple sources, the
notations defined above need to be extended to take the
sources into account. Therefore, an extra symbol is in most cases
added to the corresponding notation defined above to represent the
source. For instance, $n_i(s, 1)$ denotes the number of probes
sent by source $s$ passing link $i$, and $\Omega_k(s)$ denotes the observation
of $R(k)$ for the probes sent by source $s$, where $n_s$ is the
number of probes sent by $s$. Note that the link instead of the node is
used as the reference in the general topology since there is no
longer a 1-to-1 mapping between nodes and links in the general
topology.

B. Related Works 

Multicast-based Inference of Network-internal Characteristics
(MINC) pioneered the use of multicast probes to create correlated
observations at the receivers of the tree topology [2], [15], [16],
where a Bernoulli model is used to describe the loss process
of a link. Using this model, the authors of [2] derive a direct
expression of the pass rate of a path connecting the source to
an internal node as follows:
Using the empirical values to replace $\gamma_k$ and $\gamma_j$, (1) becomes
a single-variable polynomial that has $|d_k| - 1$ roots according
to the fundamental theorem of algebra. However, we are only
interested in the roots falling into the support of $A_k$, i.e. $(0, 1)$.
Lemma 1 of [2] proves there is a unique solution to (1) in the
support of $A_k$ if $\sum_{j \in d_k} \gamma_j > \gamma_k$. In addition, three extreme
cases are identified and ruled out from estimation since there is
no solution to the likelihood equation if the data set used for
estimation falls into the three cases.
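
As an illustration of finding the root of interest, the following Python sketch solves the tree likelihood equation numerically. It assumes the standard MINC form $1 - \gamma_k/A_k = \prod_{j \in d_k}(1 - \gamma_j/A_k)$, with empirical pass rates supplied by the caller; the function name `solve_pass_rate` and the use of bisection are choices made here for illustration, not a method taken from [2].

```python
import math


def solve_pass_rate(gamma_k, gamma_children, tol=1e-12):
    """Find A_k in (gamma_k, 1] by bisection (illustrative sketch).

    Assumes the equation 1 - gamma_k/A = prod_j (1 - gamma_j/A), where
    gamma_k and gamma_children are empirical pass rates in (0, 1).
    """
    # Without any intersection among descendants there is no root.
    if sum(gamma_children) <= gamma_k:
        raise ValueError("no intersection among descendants")

    def h(a):
        return (1 - gamma_k / a) - math.prod(1 - g / a for g in gamma_children)

    lo, hi = gamma_k, 1.0
    if h(hi) < 0:
        raise ValueError("root lies outside (0, 1]")
    # h is non-positive at lo and non-negative at hi; shrink the bracket.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Two descendants with pass rate 0.4 each and subtree pass rate 0.6.
print(solve_pass_rate(0.6, [0.4, 0.4]))
```

For two descendants the equation is linear in $A_k$, so the numerical result can be checked against $\gamma_a \gamma_b / (\gamma_a + \gamma_b - \gamma_k)$, which is 0.8 for the values above.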

Recently, Zhu proposed an analytical solution for the general
topology [6], where the likelihood equation is as follows:
where $S(i)$ is the set of sources sending probes to node $i$ and $k$
is one of the sources. It is also proved that (1) is a special case
of (2). Like [2], [6] also discusses the data consistency problem
along the same lines as its predecessor, for the general topology.
In the paper, Zhu points out the difference between the tree topology
and the general one in terms of data consistency. Despite this,
the paper, like its predecessor, treats the data falling into the 3
types as exceptions and eliminates them from estimation.

Unfortunately, both [2] and [6] fail to go a step further to
consider whether the unique solution returned from (1) or (2)
for a data set is always the MLE.

III. Problem Formulation 

As stated, previous works fail to consider the impact of data
on likelihood equations. To illustrate the impact, we examine
the likelihood equations proposed in [2] and [6] with imperfect
data to assess their validity in this section.

A. Statistical Implication 

Given observation $\Omega$, a likelihood function $P(\Omega|\theta)$ is
constructed as a probability measure where $\theta$ is a variable.
The maximum likelihood principle proposed by Fisher aims
to find the $\theta$ that maximizes $P(\Omega|\theta)$. The structure of
the likelihood function depends on $\Omega$, and so does the likelihood
equation since it is derived from the likelihood function. Thus,
the likelihood equation, as a statistical model, connects some
random variables to the others and expresses the relation
between the variables. The relation can be analyzed on the
basis of matching a model to data. Taking (1) as an example,
both sides of the equation denote the loss rate of subtree
$k$, where the left hand side (LHS) uses the data obtained from
observation directly to express the loss rate of subtree $k$ while
the right hand side (RHS) uses probabilistic reasoning to
achieve the same. The LHS can be viewed as the data and
the RHS as the model. The correspondence between data and
model becomes obvious if we expand both sides of (1),
where the LHS is as follows:
which is constructed from $\Omega_k$. It is easy to prove
and the values of the terms on the
RHS of (3) monotonically decrease from left to right, i.e.
a term in a left summation is larger than a term in a right
summation whose subscript has one more index than its left
counterpart. Thus, $\sum_{j \in d_k} \gamma_j \geq \gamma_k$.

In contrast to the LHS, the RHS of (1) is built on the
frequentist view that expresses the loss rate of subtree $k$ by
the product of the loss rates of the subtrees rooted at node $k$.
Expanding the RHS, we have
Subtracting 1 from both (3) and (5) and multiplying both by $A_k$,
one can see the 1-to-1 correspondence between the terms of
(3) and those of (5). The correspondence reflects that the MLE
can only be achieved if the RHS (model) matches the LHS
(data).

The above discussion shows that if the LHS of (1) matches
the RHS term by term, the solution obtained from the equation
is the MLE. The model used by (1) is based on the assumption
that every term of (1) exists. Note that matching the RHS to the
LHS, term by term, is also the condition under which (1) holds.
As the number of probes increases, the variance of the estimate
decreases according to the Fisher information. This is also
reflected in the corresponding terms. As $n \to \infty$, one can even
use a single pair of the corresponding terms to form an explicit
estimator. For
instance, the estimator proposed in ifTTj is based on the last 
pair of the correspondences. However, if $n < \infty$, some terms
on the LHS may not exist, and then we must consider:

1) whether (1) is still the likelihood equation of the data
set, and

2) whether the unique solution obtained from (1) is the
MLE.



B. No Solution because of Invalidity 

Recall lemma 1 of [2], which states that there is a unique solution
to (1) in the support of $A_k$ if $\sum_{j \in d_k} \gamma_j > \gamma_k$. In practice,
this condition means that there is at least one intersection in the
observations of the descendants of node $k$; mathematically, the
condition can be written as $\exists x, y;\ I(\{x, y\}) > 0,\ x, y \in M \setminus \emptyset$.
Note that lemma 1 does not ensure that the solution is the MLE
of $A_k$.

In contrast, if
we must have $\forall x, y;\ I(\{x, y\}) = 0,\ x, y \in M \setminus \emptyset$. We call
this complete mutual exclusion at node $k$. Once complete
mutual exclusion occurs at node $k$, there is no correlated
information about $A_k$ in the observation. Thus, lemma 1 of [2]
concludes there is no solution to (1). This can be explained
either by only considering equation (1) or by considering the
validity of equation (1). [2] takes the former and considers (1)
a concave function that does not intersect the $A_k$ axis in
$(0, 1)$. We take the latter: if complete mutual
exclusion occurs at node $k$, the loss rate of subtree $k$ is equal
to $1 - \sum_{j \in d_k} \frac{\gamma_j}{A_k}$ instead of $\prod_{j \in d_k} (1 - \frac{\gamma_j}{A_k})$. This means
that given complete mutual exclusion, (1) no longer holds,
let alone has a solution.



C. Incorrect Solution because of Invalidity 

As stated, if $n < \infty$ or $n \ll \infty$, some of the terms on the
LHS of (1) may not exist; however, their counterparts on the
RHS do as long as $\gamma_i \neq 0, i \in d_k$. If so, there is at least one
mismatch between the LHS and the RHS. Then, the unique
solution obtained from (1) may not be the MLE. We call this
partial mutual exclusion, mathematically

$\exists x, I(x) = 0;\ x \in M \setminus \emptyset$.

As with complete mutual exclusion, (1) does not hold if a partial
mutual exclusion occurs in the observation. For instance, assume
node $k$ has 3 descendants, $a$, $b$, and $c$, with $I(\{a, b\}) = n_{ab}(1) > 0$,
$I(\{a, c\}) = 0$, $I(\{b, c\}) = 0$, and $I(\{a, b, c\}) = 0$. Putting the
available information into the expansion of (1), we have
Although there is a unique solution to (7), the solution is
certainly not the MLE of $A_k$ since the data and the model
are obviously mismatched. In fact, the observation of $R(c)$
does not provide any information about $A_k$ and should not be
considered. If we remove the terms related to subtree $c$, we
have

$$\frac{n_{ab}(1)}{n} = \frac{n_a(1)\, n_b(1)}{n^2 \cdot A_k}$$

where the model fits the data.

TABLE I
Classification of Observations

Observation Class     partition     chain
perfect
chained-only                        1
partition only        1
chained partition     1             1
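
The mismatch described above can be checked numerically. The following Python sketch uses made-up counts and assumes the likelihood equation has the form $1 - \frac{n_k(1)}{n A_k} = \prod_{j \in d_k}(1 - \frac{n_j(1)}{n A_k})$; the helper `residual` is hypothetical. It shows that the value of $A_k$ obtained after removing subtree $c$ balances the two-descendant equation but not the three-descendant one.

```python
import math


def residual(a_k, n, n_k, n_children):
    """LHS minus RHS of the assumed likelihood equation at a trial A_k."""
    lhs = 1 - n_k / (n * a_k)
    rhs = math.prod(1 - n_j / (n * a_k) for n_j in n_children)
    return lhs - rhs


# Hypothetical counts: a and b intersect, c is exclusive of both.
n = 1000
n_a, n_b, n_c, n_ab = 400, 400, 5, 200

# Estimate after removing subtree c: n_ab/n = n_a*n_b/(n^2 * A_k).
a_two = n_a * n_b / (n * n_ab)

# With c removed, the equation is satisfied by a_two.
r_two = residual(a_two, n, n_a + n_b - n_ab, [n_a, n_b])
# With c kept, the same value no longer satisfies the equation.
r_three = residual(a_two, n, n_a + n_b - n_ab + n_c, [n_a, n_b, n_c])
print(a_two, r_two, r_three)
```

A nonzero `r_three` means the three-descendant equation is solved by a different value of $A_k$, which is the discrepancy the text attributes to the model containing terms that have no counterpart in the data.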



As shown in the previous subsection, (1) is no longer valid if
there is a mutual exclusion in the observation. Then, the following
questions emerge:

1) how many types of exclusion exist in observations?

2) is there an estimator applicable to all types of
observations?

We will address these issues in the next section.



IV. Data Classification and Solutions 
A. Classification of Data Set 

The discussion presented in the last section reveals the
impact of observations on estimation and details the
incompleteness of previous works on loss tomography. Considering
the variation in observation, we propose a new strategy in
loss tomography to match a model to data. To make this
possible, we divide the data sets used in estimation into a
number of classes on the basis of intersection and partition in
the observations of descendants and introduce a number of
models, one for each class of data. The classification is presented
in Table I, where

• the perfect class denotes the data sets that satisfy the
following:

$\forall x, I(x) > 0, x \in M \setminus \emptyset$;

• the chained-only class is for the data sets that are not in
the perfect class, but in which the observations of the
descendants cannot be divided into two exclusive groups, i.e.
$\exists x, y, I(\{x, y\}) = 0, x, y \in M \setminus \emptyset$ and

if $(I(\{x, y\}) = 0, x, y \in M \setminus \emptyset)$ then

$\exists z, I(\{x, z\}) > 0 \wedge I(\{y, z\}) > 0, z \in M$;

• in contrast to the chained-only class, the partition-only class
is for such mutual exclusions that the observation of $R(k)$ can
be divided into a number of exclusive partitions and at
least one partition has more than 2 descendants. Within
a multi-descendant partition, the observation is perfect;

• the chained partition class is for the data sets that combine
the characteristics of the above two classes, i.e. the
observation can be divided into a number of exclusive
partitions and at least one partition has more than 2
members, and in a multi-member partition, its observation
falls into the chained class.

Figure 1 illustrates the four classes, where each circle represents
the observation of a descendant. Among the subfigures, (a) is
for the perfect class, (b) for the chained-only class, (c) for the
partition-only class, and (d) for the chained partition class.
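
The classification can be sketched in Python as follows. This is a simplified reading of the four definitions, not the paper's procedure: it assumes that exclusive partitions are the connected components of the graph linking two descendants whenever their observations intersect, that a partition is perfect when every non-empty subset of its members has a positive count, and that every descendant observes at least one probe. The input format and the name `classify` are hypothetical.

```python
from itertools import combinations


def classify(obs):
    """Assign the observation at a node to one of the four classes.

    obs maps a descendant label to a list of 0/1 flags, one per probe
    (1 = some receiver below that descendant observed the probe).
    """
    labels = sorted(obs)
    n = len(obs[labels[0]])

    def count(x):
        # I(x): probes observed simultaneously by every member of x.
        return sum(all(obs[j][i] for j in x) for i in range(n))

    # Link two descendants when their observations intersect.
    adj = {v: set() for v in labels}
    for x, y in combinations(labels, 2):
        if count((x, y)) > 0:
            adj[x].add(y)
            adj[y].add(x)

    # Connected components of that graph are the exclusive partitions.
    parts, seen = [], set()
    for v in labels:
        if v in seen:
            continue
        comp, stack = set(), [v]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        parts.append(sorted(comp))

    def perfect(group):
        return all(
            count(x) > 0
            for size in range(1, len(group) + 1)
            for x in combinations(group, size)
        )

    all_perfect = all(perfect(p) for p in parts)
    if len(parts) == 1:
        return "perfect" if all_perfect else "chained-only"
    return "partition-only" if all_perfect else "chained partition"


print(classify({"a": [1, 1, 0, 0], "b": [0, 1, 1, 0], "c": [0, 0, 1, 1]}))
```

Note that this sketch also labels complete mutual exclusion (every partition a single descendant) as partition-only, whereas the text treats that case separately as having no solution.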




Fig. 1. Data Classes: (a) perfect, (b) chained-only, (c) partition-only, (d) chained partition.

B. Perfect Observation

As stated, most of the likelihood equations proposed previously
do not consider the variation of observations. With the
introduction of data classes, the likelihood equations proposed
in the past need to be examined to find their applicability in
regard to the data classes. The equations of concern in
this paper are (1) for the tree topology and (2) for the general
topology because they are the most influential ones in each of
the topologies. The following theorem establishes the validity of
(1).

Theorem 1. The estimate obtained from (1) is the MLE iff
the data used in estimation falls into the perfect class.

Proof: The estimate obtained from (1) has been proved to
be the MLE [2], where mutual exclusions in observations have
not been considered. On the basis of the analysis presented in
the last section, the RHS of (1) contains all possible terms of
correlations, from pairs of descendants to the product of all
descendants. This requires the LHS to have all corresponding
terms to match the RHS. Therefore, there must not be
any form of mutual exclusion in the observation. On the other
hand, if there is a mutual exclusion between
the observations of siblings, (1) no longer holds. Then, the
theorem follows. ■

Another estimator is proposed recently in [5] to avoid
the use of approximation to find the solution of (1) if a
node has 5 or more descendants. [5] proposes an equivalent
transformation to turn (1) into a linear function by merging
the descendants into two groups. The transformation takes
advantage of the self-similarity of (1), and derives the following:

where $k_1$ and $k_2$ denote the two groups, which satisfy
$d_k = k_1 \cup k_2$ and $k_1 \cap k_2 = \emptyset$. Further, $k_1$ and $k_2$ can
be considered two virtual descendants of node $k$, each
connecting the descendants of its group. Note that the derivation
of (8) takes advantage of theorem 1, where it is assumed that the
observations of $k_1$ and $k_2$ fall into the perfect class. If
$I(\{k_1, k_2\}) > 0$, it is easy to prove there is a unique solution
and the solution can be obtained analytically. On the other
hand, if there is a unique solution to (8), the same solution
should be obtained from (8) no matter how the two groups
are constructed. This requires $\forall x, I(x) > 0, x \in M \setminus \emptyset$. If
$\exists x, I(x) = 0, x \in M \setminus \emptyset$, the $x$ can be selected as one of
the two groups. However, the $x$ itself is not in the perfect
class. Thus, the assumption made previously does not hold,
and neither does (8). The following corollary provides the
detail.

Corollary 1. If a data set belongs to the perfect class, the
maximum likelihood estimate can be obtained from (8).



C. Chained Only 

As stated, (1) is only valid for the perfect data sets. If the
data set obtained from an experiment falls into the chained-only
class, a new likelihood equation is needed, which can be
obtained by removing from the RHS of (1) the terms
that correspond to $I(x) = 0, x \in M \setminus \emptyset$. Let $m_e, m_e \in M$ denote
the set of the terms that need to be removed; if $j \in m_e$, $|j|$
denotes the number of descendants involved in the term. Then,
the likelihood equation takes the following form
where the summation on the RHS is for the terms that need
to be removed from the first term on the RHS.

Given (9) as the likelihood equation, the next question is
whether there is a unique solution to it in the support of $A_k$.
The following lemma answers the question.

Lemma 1. Let $C$ be the set of $c = (c_i)_{i=1,2,..,|d_k|}$ with
$c_i \in (0, 1)$ and $\sum_i c_i > 1$. The equation
$1 - x = \prod_{i \in d_k} (1 - c_i x) - \sum_{j \in m_e} (-x)^{|j|} \prod_{i \in j} c_i$
has a unique solution $x(c) \in (0, 1)$ if the summation term is
a part of the product one. Moreover, $x(c)$ is continuously
differentiable on $C$.

Proof: See appendix. ■

The lemma extends lemma 1 of [2]. Like (1), (9) is a
polynomial, but with a degree lower than that of (1) since at
least $n_{d_k}(1) = 0$. Using the lemma, we can prove the solution
to (9) is the MLE.

If the chained observations of $\Omega_j, j \in d_k$ can be divided
into two exclusive groups, $k_1$ and $k_2$, where the observations
of $k_1$ fall into the perfect class, the observations of $k_2$
are exclusively partitioned, i.e. there is no intersection in the
observations of any two descendants of $k_2$, and the
observation of each descendant of $k_2$ intersects with all of
$k_1$, then (8) can be used to obtain the MLE. For instance,
if node $k$ has 3 descendants, $a$, $b$, and $c$, and the observations
of the 3 descendants belong to the chained-only class, where
$I(\{a, b\}) > 0$, $I(\{a, c\}) = 0$, and $I(\{b, c\}) > 0$, then with $a$ and $c$
in $k_2$ and $b$ in $k_1$, we have the MLE from (8), where

$$\frac{n_{ab}(1) + n_{bc}(1)}{n} = \frac{(n_a(1) + n_c(1))\, n_b(1)}{n^2 \cdot A_k}$$

Hence, merging the descendants having exclusive observations
into $k_2$ is equivalent to removing from the model part those
terms that do not occur in the data part. Since each descendant in
$k_2$ intersects with every one of $k_1$, multiplying the statistic of
$k_2$ by that of $k_1$ maintains the correspondence between data
and model.
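
The grouping in this example can be worked through in a few lines of Python. The counts below are invented for illustration and the function name `merged_estimate` is hypothetical; the sketch only rearranges the two-group relation $\frac{n_{k_1 k_2}(1)}{n} = \frac{n_{k_1}(1)\, n_{k_2}(1)}{n^2 \cdot A_k}$ for $A_k$, with $k_1 = \{b\}$ and $k_2 = \{a, c\}$.

```python
def merged_estimate(n, n_k1, n_k2, n_k1k2):
    """Solve n_k1k2/n = n_k1*n_k2/(n^2 * A_k) for A_k (illustrative)."""
    if n_k1k2 == 0:
        raise ValueError("the two groups do not intersect")
    return n_k1 * n_k2 / (n * n_k1k2)


# Hypothetical counts for a chained observation: a-b and b-c intersect,
# while a and c are mutually exclusive.
n = 1000
n_a, n_b, n_c = 300, 400, 100
n_ab, n_bc = 150, 50

n_k1 = n_b               # group k1 = {b}
n_k2 = n_a + n_c         # group k2 = {a, c}; exclusive, so counts add
n_k1k2 = n_ab + n_bc     # probes seen by b together with a or with c

print(merged_estimate(n, n_k1, n_k2, n_k1k2))
```

Because $a$ and $c$ never observe the same probe, their counts add without an inclusion-exclusion correction, which is why the merged statistics stay consistent with the model side.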

The above discussion shows that by appropriate grouping,
(8) can obtain the MLE for some of the data sets falling
into the chained class. However, if a given data set cannot be
divided into two groups as above, the estimate obtained by
(8) is not the MLE. Despite this, the estimate obtained by
(8) is still a little better than that obtained by (1). Using the
previous example, assume $I(\{a,b\}) > 0$, $I(\{a,c\}) = 0$,
and $I(\{b, c\}) = 0$. In this case, the observations cannot be
merged into two groups as above. If we merge $b$ and $c$ and
put the statistics into the expansion of (8), we have

$$\frac{n_{ab}(1)}{n} = \frac{n_a(1)\, n_b(1)}{n^2 \cdot A_k} + \frac{n_a(1)\, n_c(1)}{n^2 \cdot A_k}$$

As with (7), there is an inconsistency between the LHS and the
RHS of this equation. Thus, the estimate obtained from the equation
is not the MLE either, although the error here is smaller than
that of (1). On the other hand, if we merge $a$ and $b$, we cannot
even have a solution since (8) fails to hold.

D. Partition-Only 

For simplicity, the discussion starts from the
observation of $R(k)$ that consists of two exclusive partitions,
and is then extended to multiple partitions.

Let $k_1$ and $k_2$ be the two partitions. The LHS of
the likelihood equation is then equal to 1 minus the sum of the
pass rates of $k_1$ and $k_2$ since their observations are mutually
exclusive, and the RHS of the equation is obtained from (1)
by deducting those terms that involve the members of
the two partitions from the terms of the perfect class. The
likelihood equation is given in theorem 2.

Theorem 2. For a network of the tree topology, if the
observation of $R(k)$ is partitioned into two exclusive parts,
the likelihood equation of the observation is as follows:
Proof: Let $ME$ denote the terms that need to be removed
from (1) because of the mutual exclusion. Then, according to
(1), we have
Removing $ME$ is equivalent to dividing the descendants into
two groups according to the mutual exclusion, and adding the
terms of one group to another. Rearranging the terms of the
above, we have (10). ■

Given (10), we have a polynomial whose degree is one
less than the number of descendants in the larger
exclusive group. If the degree is larger than or equal to 5, there
is no closed form solution to the polynomial unless some of
the descendants can be merged when we estimate the pass rate
from the source to their parent. Fortunately, this is achievable
since the observation of each group is perfect. Then, (8) can
be used on each partition to turn (10) into a linear equation,
and a closed form solution follows.

If the observation of node $k$, $\Omega_k$, is divided into $Q$ exclusive
partitions ($Q > 2$), we have the following theorem for its
likelihood equation.

Theorem 3. If the observations of $R(k)$ are divided into
$Q$ partitions, the likelihood equation is
Proof: If the observation of $R(k)$ is divided into $Q$ exclusive
partitions, it is equivalent to having $Q$ independent subtrees
connected to node $k$, and the pass rate of subtree $k$ is equal to
the sum of the pass rates of the $Q$ independent subtrees. For
each of the subtrees, there is a likelihood equation as
where $\hat{\gamma}_j = \frac{n_j(1)}{n}$.

Because of the independence in the observation of the $Q$
partitions, the likelihood equation is equal to the sum of the
likelihood equations of the $Q$ independent ones. Then, the
theorem follows.

We can also prove (11) from the likelihood function
constructed directly from the observation. Given the partitioned
data set, the log-likelihood function of the observation with
respect to $A_k$ can be written as

$$L(A_k) = \sum_{j \in \{1,..,Q\}} \left[ n_{k_j}(1) \log A_k + n_{k_j}(0) \log(1 - A_k \beta_{k_j}) \right]$$

Differentiating it with respect to $A_k$ and setting the derivative
to 0, we have



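$$\frac{dL(A_k)}{dA_k} = \sum_{j \in \{1,..,Q\}} \left[ \frac{n_{kj}(1)}{A_k} - \frac{n_{kj}(0)\,\beta_{kj}}{1 - A_k \beta_{kj}} \right] = 0$$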
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Solving it, we have 
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Then, we have 
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Note that if the observation of kj falls into the perfect class, 
we have 

$$\beta_{kj} = 1 - \prod_{i \in d_{kj}} \left(1 - \frac{\gamma_i}{A_k}\right), \quad j \in \{1,..,Q\}$$



Using the above to substitute $\beta_{kj}$ in (13) and rearranging 
the terms afterwards, we have (11). 

■ 
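As a numerical illustration of the log-likelihood function used in the proof above, the following minimal sketch maximizes L(A_k) over (0, 1] by bisection on its derivative. It assumes, for illustration only, that every beta_kj is a known constant in (0, 1), whereas in the perfect class the text expresses beta_kj through A_k. The names `log_likelihood`, `score`, `mle_pass_rate`, `n1`, `n0` and `beta` are hypothetical and do not come from the paper.

```python
import math
from typing import Sequence


def log_likelihood(a: float, n1: Sequence[int], n0: Sequence[int],
                   beta: Sequence[float]) -> float:
    """L(A_k) = sum_j [n_kj(1) log A_k + n_kj(0) log(1 - A_k beta_kj)]."""
    return sum(p * math.log(a) + q * math.log(1.0 - a * b)
               for p, q, b in zip(n1, n0, beta))


def score(a: float, n1: Sequence[int], n0: Sequence[int],
          beta: Sequence[float]) -> float:
    """Derivative dL/dA_k; it is decreasing in A_k because L is concave."""
    return sum(p / a - q * b / (1.0 - a * b)
               for p, q, b in zip(n1, n0, beta))


def mle_pass_rate(n1: Sequence[int], n0: Sequence[int],
                  beta: Sequence[float], tol: float = 1e-12) -> float:
    """Maximize L(A_k) on (0, 1] by bisection on the sign of the derivative."""
    # Assumption: each beta_kj lies in (0, 1), so 1 - A_k * beta_kj > 0 on (0, 1].
    lo, hi = 1e-12, 1.0
    if score(hi, n1, n0, beta) >= 0.0:
        return hi  # L is still non-decreasing at A_k = 1: boundary maximum
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, n1, n0, beta) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With a single partition the stationary point can be checked by hand: setting the derivative to zero gives A_k = n_k(1) / (beta_k (n_k(1) + n_k(0))), which the sketch reproduces when that value lies in (0, 1].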
Since (12) is a concave function, (11) is a concave function 
because (11) is a sum of (12). In addition, there is a common 
support for each of the member functions of (11). Then, (11) 
has a maximum point in (0, 1) that can be obtained directly 
by 



A k = 
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where (0 is used in each of the exclusive partitions to divide 
the descendants into two groups and merge their statistics, 
where nki(l), i 6 {T2}, is the number of probes that 
are observed simultaneously by the receivers of the two groups 
of the ith exclusive partition. In addition, $\hat{\gamma}_{ki1}$ and $\hat{\gamma}_{ki2}$ are the 
empirical pass rates from the source to the two groups of the 
ith exclusive partition, respectively. 

E. Chained Partition 

As defined, a data set falling in this class can be divided 
into a number of exclusive partitions, each consisting 
of the observation of a number of descendants. In addition, 
the observation of a partition that has more than 2 members 
is not in the perfect class but chained. Thus, a new likelihood 
equation is needed for this class of data, and the equation 
must combine the features of the likelihood equations proposed 
for the chained-only and partition-only classes. Since the 
observation of a partition is not in the perfect class, the 
likelihood equation for the partition is in the form of (|S). If 
there are Q (Q > 1) partitions, since the observations between 
partitions are exclusive, the likelihood equation for this class 
is in the form of 

As before, we can prove there is a unique solution to (14) 
since (14) is a sum of concave functions. Unfortunately, there 
is no closed-form solution to (14) at this moment. 

F. Complete Mutual Exclusion 

Given Theorem 3, we know that if there is no intersection 
between the observations of the receivers attached to the 
subtrees rooted at node k, the observations of the descendants 
do not provide any information about the path connecting 
the source to their parent. Thus, there is no need to add the 
correlation between them into the model part of (fTJ. The RHS 
of (fTJ is equal to 

$$1 - \sum_{i \in d_k} \frac{\gamma_i}{A_k} \qquad (15)$$

Using the empirical probability $\hat{\gamma}_i = n_i(1)/n$ to replace $\gamma_i$ in 
(15), (fTJ collapses and there is, of course, no solution. 

To solve the problem, we can either send more probes 
to break the tie or use the bootstrap to produce some synthetic 
probes to create intersections. Then, we can use an appropriate 
likelihood equation to estimate $A_k$. 

G. Independent Path 

During the presentation, we always assume that either there 
are at least 2 descendants in a partition or at least there is 
a partition that has 2 descendants of node k. Without this 
assumption, a data set may be in the class of complete mutual 
exclusion. Under this assumption, if each partition has more 
than 2 descendants, the pass rate from the source to the parent 
of the descendants can be estimated independently and the 
total pass rate is equal to the sum of the pass rates of the 
partitions. 

Although using Theorem 3 we are able to obtain the same 
solution as that obtained from (fTJ and revive the estimator 
based on the equivalent transformation, a new question 
emerges: whether an estimator needs to consider a 
partition that has only a single descendant. The following 
theorem provides the answer to this question. 

Theorem 4. The observation of single-descendant groups 
has no impact on the estimation of the pass rate of the path 
connecting the source to the parent of the descendant. 

Proof: According to the condition, (11) is the likelihood 
equation fitting the data. For this case, a number of identical 
terms can occur in the summation of the LHS and RHS of the 
equation, one for the loss rate of a single-descendant group. 
Those terms cancel each other without affecting the estimation. 
Then, the theorem follows. ■ 

For instance, in the previous example descendant c is 
independent from a and b. Based on Theorem 4, c is removed 
from estimation and we have 

rzafc(l) _ n a (l)n b (l) 

Using Lemma 2, we have 

nA k 

Simplifying the above, we have (16). Theorem 4 also explains 
why there is no solution for a data set falling into the 
complete mutual exclusion class. This is because the observation 
of each descendant has its own partition, and then each of 
them is canceled from (11), which leads to the collapse of (11). 
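The role that exclusive partitions play in Theorems 3 and 4 can be illustrated with a small sketch. It assumes that each probe is recorded as a 0/1 row with one entry per descendant of node k, and that two descendants belong to the same partition whenever some probe is observed through both of them, directly or through a chain of such overlaps. Grouping by union-find is our illustrative choice rather than a procedure stated in the text, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def exclusive_partitions(obs: Sequence[Sequence[int]]) -> List[List[int]]:
    """Group descendant indices into mutually exclusive partitions.

    obs[p][d] is 1 if probe p was observed through descendant d, else 0.
    """
    if not obs:
        return []
    m = len(obs[0])
    parent = list(range(m))

    def find(u: int) -> int:
        # Find the representative of u, halving the path on the way up.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for row in obs:
        seen = [d for d, bit in enumerate(row) if bit]
        # Descendants that observed the same probe cannot be exclusive.
        for d in seen[1:]:
            ra, rb = find(seen[0]), find(d)
            if ra != rb:
                parent[rb] = ra

    groups: Dict[int, List[int]] = {}
    for d in range(m):
        groups.setdefault(find(d), []).append(d)
    return sorted(groups.values())


def informative_partitions(obs: Sequence[Sequence[int]]) -> List[List[int]]:
    """Drop single-descendant partitions, which Theorem 4 says have no impact."""
    return [g for g in exclusive_partitions(obs) if len(g) > 1]


def is_complete_mutual_exclusion(obs: Sequence[Sequence[int]]) -> bool:
    """True if every descendant forms its own partition (no intersections)."""
    parts = exclusive_partitions(obs)
    return bool(parts) and all(len(g) == 1 for g in parts)
```

For three descendants a, b, c indexed 0, 1, 2, where c never shares a probe with a or b, the sketch returns the partitions [[0, 1], [2]] and keeps only [0, 1] for estimation.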

V. Multi-sources 

In the general topology, a node may have more than one 
parent, and thus a node may observe probes sent by multiple 
sources. Because of this, estimating the pass rate of a link must 
consider the probes sent from all related sources, regardless of 
whether the probes pass the link of interest or only pass its ancestors. 
Therefore, (O is the likelihood equation of A(s, i) for a path 
connecting source s to node i regardless of whether node i is a 
joint node or not, where a joint node is a node that has more 
than one parent [6]. The difference between (fZ|i and (JTJ is 
at those nodes that can receive probes from multiple sources; 
where the former considers all probes sent by related sources 
to node f{j) and uses 

E se s(i)«i(M) 

Q/ ■ == _ 

3 Esesw'VG)^ 1 ) 
as the empirical pass rate of link j, while the latter has 

d . - aw 

Note that in $1% i is the parent node of link j and S(i) is the 
set of sources sending probes to node i. Despite the differences 
between the likelihood equations of the tree and the general 
topologies, both take into account all probes reaching the end 
node of the path of interest. More importantly, the principle of 
fitting a model to data becomes more obvious in the general 
topology than in the tree topology. If we consider the 
two sides of (J3J as data and model, respectively, the LHS of ((2j, 
as the data, comes from the observation of a single source, called 
the individual observation, while the RHS, as the model, considers 
the probes sent by multiple sources, called the global observation. 
As previously proved, the MLE can be obtained if and only 
if the RHS matches the LHS and there is a unique solution to 

Considering fitting a model to data in the general topology, 
we can use the same classification as that defined in the tree 
topology to divide data sets into 4 classes. Since a node 
in a general topology can have multiple parents, and even a node 
that has only one parent can have multiple ancestors located on 
different paths to the node, the nodes are divided into 3 types: 
single-parent, single-source nodes (single nodes); multiple-parent 
nodes (joint nodes); and single-parent, multiple-source 
nodes (shared nodes). For the single nodes, 
the methods proposed for the tree topology can 
be used to estimate the loss rate of the path connecting the 
source to the node given the loss rates of the subtrees rooted 
at the node, in particular if there are shared segments in 
the subtree. Our focus is on the joint nodes because single 
nodes need to know the pass rate of the shared subtrees and 
a shared node can be considered a special joint node. The 
difference between a joint node and a shared one lies in the 
paths connecting the sources to the node: the former 
has distinct paths, while the latter has a shared part of the 
paths. Therefore, they can be handled in the same manner 
in terms of estimation. In addition, if we use the 
divide-and-conquer approach proposed in [5] to divide a general topology 
into a number of trees, there is no need to consider the shared 
nodes separately. 

A. Joint Node 

For a joint node, say i, there are up to |S(i)| likelihood 
equations, one for each path connecting a source to the node. 
Since all the paths connect to an ordinary subtree rooted at 
node i, we need to have a unique pass rate for the subtree 
that can maximize the likelihood function constructed from 
observation. As previously analyzed, each of the likelihood 
equations is determined by observation, (ffj) as (Q3 holds if and 
only if observation is in the perfect class, i.e. $\forall j$, $\Omega_i(s_j)$, $s_j \in S(i)$, 
is in the perfect class. In this case, the |S(i)| likelihood 
equations are equivalent to each other in regard to the shared 
subtree and the LHS of (O matches its RHS. Knowing the 
pass rate of one of the paths, say from $s_k$, the pass rate of 
another path, say from $s_j$, can be obtained easily by 
A(Sj,i) = ^A(S k ,i) 

since the |S(i)| likelihood equations are in the form of 
$1 - x = \prod_{j \in d_i} (1 - c_j x)$. Given the fact that subtree i is a common 
part of the paths from the sources of S(i) to R(i), the pass 
rates of the paths from the sources to node i are proportional 
to the pass rates from the sources to R(i). 
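Since an equation of the form 1 - x = prod_j (1 - c_j x) always has the trivial root x = 0, the root of interest is the one in (0, 1]. The following minimal sketch brackets that root by bisection. It assumes that the coefficients c_j are given and lie in (0, 1]; the function names and the bisection method are illustrative choices, not the procedure used in the paper.

```python
from math import prod
from typing import Sequence


def residual(x: float, c: Sequence[float]) -> float:
    """Right-hand side minus left-hand side of 1 - x = prod_j (1 - c_j x)."""
    return prod(1.0 - cj * x for cj in c) - (1.0 - x)


def solve_likelihood_equation(c: Sequence[float], tol: float = 1e-12) -> float:
    """Return the non-trivial root of 1 - x = prod_j (1 - c_j x) in (0, 1]."""
    if any(not 0.0 < cj <= 1.0 for cj in c):
        raise ValueError("each c_j is assumed to lie in (0, 1]")
    if sum(c) <= 1.0:
        # The residual is then non-negative on [0, 1], so there is no
        # sign change to bracket; this mirrors the collapsed case.
        raise ValueError("no non-trivial root: sum(c) must exceed 1")
    lo, hi = 1e-9, 1.0
    if residual(lo, c) >= 0.0:
        raise ValueError("cannot bracket a root numerically")
    # residual(lo) < 0 because the slope at 0 is 1 - sum(c) < 0, and
    # residual(1) = prod_j (1 - c_j) >= 0, so a root lies in (lo, 1].
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid, c) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For two coefficients the equation reduces to a linear one after dividing by x, giving x = (c_1 + c_2 - 1) / (c_1 c_2), which provides a simple check of the numerical result.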

If $\exists\, \Omega_i(s_j)$, $s_j \in S(i)$, that is not in the perfect class, the 
situation becomes more complex than that of the tree topology 
and needs to be analyzed further. To assist the following 
discussion, we divide the observations of a shared subtree into 
4 classes on the basis of individual and global observation and 
present them in Table IV-AI. The global observation of node i 
is defined as 

$$\Omega_i = \bigcup_{s_j \in S(i)} \Omega_i(s_j).$$

Apart from the (perfect, perfect) class, we need to consider 
data consistency again, in a different way from that defined in 
[2] and [5]. The previous concern focused on the consistency 
between data and model, and the approach used was to eliminate 
those data sets that are inconsistent with the model. Here, the 
consistency has been extended to consider the consistency 
between individual observations, and the consistency between 
an individual observation and a global one. With multiple 
sources sending probes to receivers, each source creates its 
own individual observation, that is, the view of the source on 
the shared segment. The views created by different sources can 
differ from each other, although when $n_s \to \infty$, $s \in S(i)$, 
the same views are expected. However, when $n_s < \infty$, 
different views may occur that make estimation impossible 
since there is a lack of a consistent model. 

For a data set in the (others, perfect) class, although the 
data created by different sources compensate for each other to 
create a perfect view, the individual data are not consistent with 
the global one, which implies different models for the likelihood 
equations. Because of this, estimation cannot proceed and we 
need to either send more probes until the data fall into the 
(perfect, perfect) class or skip the estimation. This also applies 
to the data of the (others, others) class since if $\Omega_i(s_i)$ is not 
consistent with $\Omega_i(s_j)$, the |S(i)| likelihood equations are 
different from one another. 



The (others, identical others) class is designated for the 
observation in which $\forall\, \Omega_i(s_j)$, $s_j \in S(i)$, are identical in terms of 
correlation. Hence, if the individual observations are identical 
to the global one in terms of correlation, a model that is 
consistent with the data can be created, and so can a consistent 
likelihood equation for every source. This extends (O to cover 
imperfect data to some degree, and the following theorem 
presents the likelihood equation for the pass rate of a shared 
subtree. 

Theorem 5. Given data in the (others, identical others) 
class, the likelihood equation of the pass rate of the shared 
subtree is as follows: 

$$|S(i)|(1 - x) = |S(i)| \prod_{j \in d_i} (1 - \alpha_j x) - \sum_{j \in d_i} me_j(x) \qquad (18)$$

where $\alpha_j$ is the ratio between the number of probes reaching 
j, $j \in d_i$, and the number of probes reaching node i as in (17), 
and $me_j(x)$, as defined in the tree topology, denotes the terms 
that need to be removed from likelihood equation j (see the proof 
for details). 



Proof: Let x = 
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that is the pass rate of the shared 



subtree. If $\Omega_j$ is not perfect, the corresponding terms on the 
RHS of (O should be removed, as discussed for the tree 
topology. Let these terms be $me_j(x)$ for $s_j$, which is a function of 
x. We have a likelihood equation as follows for each source: 

$$1 - x = \prod_{j \in d_i} (1 - \alpha_j x) - me_j(x).$$

There are |S(i)| equations as above, one for each source. Adding 
the equations together, we have the theorem. ■ 

As before, we are able to prove that the solution space is 
concave and there is a unique solution in the support of the 
pass rate. Then, the solution is the MLE of the pass rate of 
the shared subtree. 

Given the pass rate x, the loss rate of the path connecting 

li 
a source to node i can be obtained directly by — since 

(18) is a polynomial, if its degree is 5 or higher, there is 
so far no explicit method to solve (18) other than 
approximation. To minimize the use of approximation, we can 
use the divide-and-conquer approach proposed in [6] here to 
break a general network into a number of trees, where (ffj 
or (18) is used on each joint node to obtain the MLEs of the 
number of probes reaching the joint node. With these numbers, 
a general network can be divided into a number of trees and 
the methods proposed in the previous section can be used to 
obtain the MLE of each path. 

VI. Conclusion 

Loss tomography is built on statistical inference, which 
requires a correct model to describe the observations obtained 
from an experiment. The model must match the nature of the 
data. Nevertheless, the dependency of a model on a data set 
has been either overlooked or misunderstood, which leads to the 
misconception that an estimator is applicable to all sorts of 
data sets as long as it returns a unique solution. In this 
paper we attempt to correct this and consider the validity of 
an estimator, which demands a match between data and model 
in estimation. We then revisit two of the most influential 
estimators proposed previously and find that they, like other 
estimators proposed previously, are at most the maximum 
likelihood estimators for one type of data only. To overcome this, 
fitting a model to data is emphasized in this paper, 
and the necessary and sufficient conditions of the maximum 
likelihood estimator are presented, which require 1) 
a likelihood equation matching a model to the data; and 2) 
a unique solution to the likelihood equation. The necessary 
and sufficient conditions indicate that in order to obtain an 
MLE, we need to use all available information in the observation, 
eliminate redundant information, and match a model to the 
data. To generalize the results, data obtained from experiments 
are divided into 4 classes, and 4 likelihood equations are 
presented in this paper, one for each data class. Apart from the tree 
topology, this issue is also considered for the general topology, 
where data consistency has been extended to consider the 
difference between individual views and between an individual 
view and the global view. The connection and relation between 
them have been analyzed and an estimator is proposed for the 
case of identical individual views. 
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1 - x, h 2 (x) = J7i(l - fyx), and 



x u ' rieej Ce ' we h ave h'i( x ) = ~ 1' ^(x) : 
and ti 3 {x) = Ejgm. \j\ * * UI_1 rie6j c 
we have h 1 (x) — 0, h 2 (x) 

IjXIj'I- 



MaO[(£i<z0 2 -£igf] > 0, and h 3 (x) = £ 
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l)x lil_1 n e ei c e < 0, if a; G [0, 1]. Note that h 3 (x) is a small 
part of fi2(x) that have two or more a, i G du timed together. 
Let $h(x) = h_1(x) - h_2(x) + h_3(x)$, which is strictly concave on 
[0, 1]. ■ 
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